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Abstract 

In various industrial contexts, estimating the distribution of unob¬ 
served random vectors Xi from some noisy indirect observations H{Xi) + 

Ui is required. If the relation between Xi and the quantity H{Xi), mea¬ 
sured with the error Ui, is implemented by a CPU-consuming computer 
model H, a major practical difficulty is to perform the statistical infer¬ 
ence with a relatively small number of runs of H. Following Fu et al. 
a Bayesian statistical framework is considered to make use of possible 
prior knowledge on the parameters of the distribution of the Xi, which 
is assumed Gaussian. Moreover, a Markov Chain Monte Carlo (MCMC) 
algorithm is carried out to estimate their posterior distribution by replac¬ 
ing H by a kriging metamodel build from a limited number of simulated 
experiments. Two heuristics, involving two different criteria to be opti¬ 
mized, are proposed to sequentially design these computer experiments 
in the limits of a given computational budget. The first criterion is a 
Weighted Integrated Mean Square Error (WIMSE) [^. The second one, 
called Expected Conditional Divergence (ECD), developed in the spirit 
of the Stepwise Uncertainty Reduction (SUR) criterion [411 H], is based 
on the discrepancy between two consecutive approximations of the target 
posterior distribution. Several numerical comparisons conducted over a 
toy example then a motivating real case-study show that such adaptive de¬ 
signs can significantly outperform the classical choice of a maximin Latin 
Hypercube Design (LHD) of experiments. Dealing with a major con¬ 
cern in hydraulic engineering, a particular emphasis is placed upon the 
prior elicitation of the case-study, highlighting the overall feasibility of the 
methodology. Faster convergences and manageability considerations lead 
to recommend the use of the ECD criterion in practical applications. 

Keywords: Inverse statistical problems; Bayesian inference; Kriging; Adaptive 
design of experiments; Metropolis-Hasting-within-Gibbs algorithm; Prior elicitation. 


1 Introduction 

In many industrial problems, engineers have to deal with uncertain quantities 
which cannot be directly measured. Moreover, some of them can suffer from 
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some inherent variability. For instance, in hydraulics, the assessment of a risk 
of flooding usually depends on some quantities, called coefficients of Manning- 
Strickler, which represent the roughness of the river bed. Because rivers are 
complex changeable systems, it appears reasonable to consider these coefficients 
as random variables. Although they cannot be directly measured, it appears 
possible to estimate their randomness from flooding data by means of computer 
simulation. 

Estimating the probability distribution of such random unobserved variables 
involves some observations Yi G (e.g., water levels), i = 1.. .n, and a com¬ 
puter model H (e.g.. Saint-Venant equation solver) which links the unobserved 
variables of interest Xi G (e.g., Manning-Strickler coefficients) to the 1): 


Y, = H{X,,d{) + U^ (1) 

where di G stands for some known or observed quantities and where Ui rep¬ 
resents some unobserved measurement errors. A Gaussian framework is adopted 
in this article: the {Xi Ui)’^ are assumed to be independent Gaussian random 
vectors such that 



where A4(/i, S) is the fc-dimensional Gaussian distribution of mean /r and co- 
variance matrix S. The issue is then to estimate the unknown parameters 
9 = (to, C) of the probability distribution of the Xi from some held data {yi, di)]^ 
1 < j < n, given H and the error covariances R. The accuracy of measurements 
is generally given or can be assessed: the assumption that R is known is a sound 
basis for the inference, since it prevents from a problem of non-identiflability of 
(d,i?). 

From the general perspective of the analysis of some independent measure¬ 
ments yi performed on similar systems (under conditions di), this statistical 
model enables to capture the inherent variability of some variables Xi in the 
population which is studied. For instance, mechanical tests generally involve 
a production-lot population of components whose precise characteristics {e.g. 
Young’s Modulus or thermal expansion coefficient) suffer from a non negligible 
variability. 

The major practical obstacle to the estimation of 9 is the GPU cost and 
time needed to evaluate H{x,d), given an input {x,d) G {Q = d + 92 )- In 
hydraulics, one run of H takes typically few hours per GPU. Several methods 
have been developed to tackle this difficulty. Geleux et al. m considered a 
maximum likelihood estimation by Expectation-Gonditional Maximisation Ei¬ 
ther (EGME) PI] based on an iterative linearisation of H: this algorithm should 
be avoided if the nonlinearities of H relative to x are significant, otherwise it 
can be very efficient. Barbillon et al. [2] proposed to couple a Stochastic Expec¬ 
tation Maximisation (SEM) algorithm [TD] with a kriging metamodelling of H 
to improve the robustness of the estimation. 

Kriging, also known as Gaussian Process (GP) regression, was suggested by 
Sacks et al. m to deal with GPU-expensive computer models. The purpose 

^ Where yi is a realization of the random vector Yi. 
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of this metamodelling technique is to build an accurate surrogate model of H 
from some computer experiments (some runs of H). Then a crucial question is 
how to determine the Design of these Experiments (DoE). Several methods of 
calibration of computer models relying on kriging were proposed by Kennedy 
and O’Hagan m and Bayarri et al. [3]. Although their statistical models are 
close to the one postulated here, an important difference is that the Xi are 
assumed to be random in this article, whereas the unknown inputs x are part 
of the parameters 9 to estimate in their studies. 

Hereafter, the Bayesian framework suggested by Fu et al. m, which involves 
a kriging of H, is considered. It allows to take account of prior information about 
the Xi (which could possibly arise from expert or past assessments) through 
the definition of a so-called prior probability distribution for 9, the density of 
which being denoted tt{9). A Metropolis-Hastings-within-Gibbs Markov Chain 
Monte-Carlo (MCMC) sampling can then be carried out to estimate the poste¬ 
rior distribution of 9 (given the field data). The benefit of kriging is twofold: 
extensive sampling gets feasible, and the uncertainty about H can be accounted 
for by embedding the CP into the statistical model and the MCMC procedure. 

From a Bayesian point of view, there is no reason to drop the uncertainty 
on H by only keeping the kriging predictor H{.). Besides, the development of 
purpose-oriented adaptive DoE approaches, such as the Stepwise Uncertainty 
Reduction (SUR) [TTl 15]. is then made possible. Such approaches seek a trade¬ 
off between shrinking the uncertainty on H (which is measured by the kriging 
covariance) as much as possible, and exploring the most interesting areas of the 
input space of H regarding the considered objective. A classical example comes 
from the field of global optimisation where the Expected Improvement criteria 
was proposed by Mockus et al. m, Jones et al. m- The purpose of this article 
is to contribute to the definition of efficient adaptive DoE algorithms for solving 
the inverse statistical problem specified earlier. 

The article is organised as follows. Section gives details about kriging 
metamodelling, as well as the maximin Latin Hypercube Design (maximin LHD) 
which provides us with a first knowledge about the computer model H (before 
starting a purpose-oriented exploration of the input space of H). In Section 
the method used to specify an informative prior 7r(0), then the inference by 
MCMC, are described. Afterwards, two methods, called Expected Conditional 
Divergence (ECD) and Weighted Integrated Mean Square Error (WIMSE), 
which derive from two purpose-oriented criteria to optimise, are proposed to 
sequentially enrich the DoE in Section]^ and Section]^ Numerical studies are 
conducted on on toy example in Section to compare the efficiency of these 
approaches with a posterior approximation standed on a static space-filling de¬ 
sign (maximin LHD). Finally, the full methodology is run through over a real 
hydraulic computer model: its input roughness parameters are calibrated from 
noisy observations of water levels. Section concludes the article by giving 
major directions for further work. 
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2 Kriging and maximin LHD space-filling de¬ 
sign 

This section recall some basics of kriging and of designing computer experiments 
which matter for the remainder of the article. 

2.1 Kriging 

Kriging is a geostatistical method m which was suggested by Sacks et al. m to 
build a cheap surrogate model of a computer model, from a limited of runs of the 
latter, over a hypercube ft C . This method has known a growing interest in 
metamodelling with the writings of Koehler and Owen [^, Stein [35], Kennedy 
and O’Hagan (TB], Santner et al. [33], amongst others. In this section, a scalar 
function ft, : O —>■ M is considered: in the case of a vector-valued function 
ift : O —>■ each component hi{.) of H{.) can be “kriged” independently from 

the others, as done in the numerical experiments in Section 

A usual manner to present kriging is starting from the premise that the 
considered function ft(.) is a particular realization of an underlying GP 'H(.): 

Vz e O, Hiz) = F{z) 13 + G{z), (3) 

where ,5 is a vector of where F{z) = (/i(z) • • • /ic(z)) with fk : 

1 < fc < K, a family of linearly independent functions, and where C/ is a 
centered GP (E[^(z)] = 0, for all z G Gl). The GP hypothesis means that 
(Gizi) ••• G{zk)f is a fc-dimensional Gaussian vector for any set {zi, ■ ■ ■ , z^} S 
n and any ft > 1. Although it may appear artificial, this assumption leads to a 
flexible statistical model which has been applied successfully in many contexts, 
and, in a Bayesian perspective, it can be interpreted as the definition of a prior 
on ft m- For any {z,w) G the mean function p. : —> M of H is defined 

by fj,{z) = E \H{z)] = F^ (3, and the covariance function AT : —> K of ^ (and 

73) by Gov [^(z), CJ(w)] = K{z,w). In the following, as most often assumed by 
authors when modelling computer models, G is stationary, thus K{z,w) only 
depends on z — w: K{z,w) = K{z — w) by abuse of notation, with Ar(0) = 1. 

Let £)jv = {zi,-- - jZArjcflbea DoE associated to observations ft at G , 
and = ('H(zi)---'H(z 7 v)), then, from a direct application of a classical 

theorem relative to the conditioning of Gaussian vectors, the process H condi¬ 
tioned by the observations, that is 'H\'HN = hN, is still a GP over Gt with mean 
function hdk : —> K and covariance function —> K. Namely, for all 

{z,w), 


ftAr(z) ~ Af{fiDiviz),KDj^iz,-)) (4) 

where 

f^D^iz) = F{z) [3 + Kz^NKN^N~^{hN — FN /3) (5) 

Kdm{z,w) = K{z,w) - (6) 

with = (Ar(z, zi) • • • K{z, zn) ) (idem for and = K{zi, zj). 

Of course, Kd^{z,z) < K{z,z): the more observations are available, the less 
uncertainty on 73 remains. If K{.,.) and /3 are known, then fi^j^iz) is the kriging 
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predictor of H(z) \'Hn = ^n, that is its Best Linear Unbiaised Predictor (BLUP), 
and (z, w) is the kriging covariance, that is the covariance function of the 
error of prediction. In particular, Kd]^{z,z) is the Mean Square Error (MSE) 
of the BLUP at z. 

Furthermore, if iL(., .) is known but (3 unknown (universal kriging), then the 
generalised least-square estimator 

P = {Fn^ Km^mFn) (7) 

of P is also the maximum likelihood estimator, and the BLUP ho^ of ^{{z) jsf 
hjsf is obtained by substituting /3 by /3 in Equation Last but not least, the 
kriging covariance which is associated to is then 

Kdm{z,w) = K{z,w) — Kz,nKn,n~^Kw,n'^ 

+ iF{z) — Fn^{Fn'^Kn,n~^Fn) ( 8 ) 

X iF{w) - 

(we use the same notation as before - P known - for the sake of simplicity) 
with [EjvJij = fjizi). It can be seen as an approximation of the covari¬ 
ance function of the GP T-LIT-Ln = and, together with is gener¬ 

ally used, for example in [^, to model the uncertainty on the Gaussian vector 
{^H{wi) ■ ■ ■'H{wm) ) \'HN = hN for any set {rci, • • • , wm} C U; see also Santner 
et al. [^, Bachoc [T] for other precisions. Following Fu et al. [T^, Kdj^{z,w) is 
used in the MGMG procedure to account for the dependence between the miss¬ 
ing data Xi due to the uncertainty on (each component hi{.) of) the computer 
model F[{.)- see Section for more details. The induced Mean Square Error 
MSEjj^ : z I—> Ko^{z, z) plays an important role in Section]^ 

In practical applications, Lf(.,.) is unknown and is estimated, thanks to a 
model 7f^(.,.) parametrised hy ip ^ by different techniques such as max¬ 
imum likelihood (as hereafter) or cross-validation. In the remainder of this 
article, the plug-in estimates obtained by replacing K{.,.) by with ip 

the estimator of ip, are employed. 


2.2 Design of experiments (maxzmm-Latin Hypercubic De¬ 
signs) 

Obviously, the predicting accuracy of kriging highly depends on the DoF D^- 
Following Picheny et al. m, it is possible to distinguish three kinds of DoFs: 

• space-filling designs, which aim to fill the input space with a finite number 
of points independently of the considered model (e.g., maximin-UlD); 


model-oriented designs, which attempt to build a suited DoF accounting 
for the features of the model H or the metamodel (e.g. IMSE, see Sec¬ 
tion 5.1); 


• purpose-oriented designs, which account for the final aim of the study to 
find the best adapted DoF (e.g., to compute an exceedance probability by 
accelerated Monte Garlo methods). 
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In this article, a purpose-oriented DoE is built in an adaptive way. A first 
calibration of the covariance parameters is performed from an initial maximin- 
LHD, then the DoE is sequentially improved using sequential strategies, which 
are detailed in sections and The concept of LHDs was introduced in EH; 
such designs ensure a good coverage of the interval to which each scalar variable 
belongs. Then m proposed the maximin distance criterion to optimize LHDs. 
Maximin means maximizing the minimum inter-site distance between the set of 
N points: 


Therefore, the maximin criterion prevents the points of the design to be close to 
each other. In the present work, maximin-UIDs are obtained by the algorithm 
of Morris and Mitchell [55] . 


3 Bayesian statement and inference 

3.1 Prior elicitation 

In the Bayesian statistical framework favored in m, a Gaussian-Inverse Wishart 
prior distribution was elicited: 


m IC ~ Mq{p,,C/a), 

C ~ IW,(A,zz). 


(9) 

( 10 ) 


This prior can be assimilated to the posterior distribution of virtual data given 
a noninformative prior, which presents some advantages in subjective Bayesian 
analysis [5]. Especially, a clear sense can be given to hyperparameters (/r, a. A, v), 
which simplifies prior calibration. 

Indeed, a can be understood as the size of virtual sample of data X, that 
modulates the strength of the practicioner’s belief in prior information (for in¬ 
stance provided by subjective experts). It should be calibrated under the con¬ 
straint a < n to ensure that the posterior behavior is mainly driven by objective 
data information. A default (let say, “objective”) choice is a = 1. 

Furthermore, fi is the prior predictive mean, median and most probable value 
of X, which can be estimated by a measure of central tendency provided by past 
calibration results in close situations. In the motivating case-study explored in 
Section]^ such information was found by bibliographical researches (Table [^. 

Finally, denoting X,Y the set of missing and truly observed data, the 
reparametrizations A = (a 1) ■ Ce and v = a q 2 imply that the con¬ 
ditional posterior distribution of C given m is the Inverse Wishart distribution 

iw(^{a+ I) Ce + {n+l)Cn, v + n+ I^ with ^ - Xi){m - XiY, 

the expectation of which being 


E[C'|m,X,Y] 


a -I-1 ^ n-\-l 

a -\- n -\- 2 a n 2 


This last expression highlights the meaning and influence of a as a virtual size. 
The components of Ce are to be calibrated in function of prior knowledge on X 
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too, expressed through its predictive prior distribution, which is a decentered 
Student law: 

^ ~ Sti/r, a + s) 

V a(a + 3) / 

with mean vector /i and covariance matrix ^^Ce- Again, in the case-study 
that motivated this work, prior information on the ratio between average values 
and standard deviation of Strickler-Manning coefficients was available (Figure 
[^, which allowed for a full prior calibration (Section]^. 

3.2 Posterior computation 

A Gibbs sampler [37] was proposed to compute the posterior distribution of 
9 = (m,C). Actually, replacing the expensive-to-compute function H with a 
kriging emulator H, as in Barbillon et al. [5], and introducing a new emulator 
error MSE, the Gibbs sampler can be adapted as follows: 

Gibbs sampler (at the (r’-l-l)-th iteration) —-— 

Given for r = 0,1,2,..., generate: 


1. - + -f _ 

fi)', v + n + l), 


2 . 

3. 


,(’'+!) I 


A/'l -^u+ 

I n+o ,' n+a 


(r) 


, where X^T^ = n ^ 

’ n+a I ''' / —1 i ’ 


X("+i)| • • • oc |R-tMSE("+i)|-5.exp \ -m<-^++' 


,0'+i))_ 1 ((y;i_^(;+i))'_ _ _ ^ (r+MSE(-+i)) ’ 




(X 


(r+ 1 ) 


/ Al - 


\ V _ w(’'+F 


where and = MSE(X*'’'^^\d) is the block 

diagonal matrix 


MSE(X‘''+^\d) 


MSEi(X(’"+i\d) 


MSEp(X("+d,d) 


} n lines 
} n lines 


In the third step, the variance matrices MSEj(X(’’+^\ d) G are defined 

by 


MSE^-(X(’'+b,d) 


E (^(H,(X(’-+b,d) -i7,(X(’'+b^d))' IHri,,) , 
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for j = 1,... ,p, where Hj denotes the j-th dimension of the Gaussian process 
T-L. Moreover, 



/ Ri 


} n lines 

f Ru 

« \ 

R = 

1 0 

Rp y 

, with Ri = 

} n lines 

{ 0 

Rii j 


where Ru is the th diagonal component of the diagonal variance matrix R. It 
is worth noting that this third conditional distribution does not belong to any 
closed form family of distributions. Therefore a Metropolis-Hastings (MH) step 
is used to simulate (see Appendix A). 

As discussed in [T^ , the use of the MCMC algorithms involves many possible 
errors. According to experimental trials, the accuracy of the metamodel plays 
a critical role in the the estimation problem. MCMC algorithms can produce 
Markov chains converging towards the desired posterior distribution. However, 
if the function H is really badly approximated, apart from the algorithmic error 
introduced by the MCMC algorithm, the result can also suffer from an emulator 
error. 


4 The Expected Conditional Divergence crite¬ 
rion for adaptive designs 

The two following sections address the issue of building adaptive designs of ex¬ 
periments, by proposing two strategies. In this section, a criterion called ECD 
(Expected Conditional Divergence) is built, which can be seen as an adapta¬ 
tion of the Expected Improvement criterion proposed in Jones et al. |I7] . Let 
us notice that the expected divergence criterion proposed in the next section, 
although close to a Stepwise Uncertainty Reduction (SUR) criterion, does not 
derive from the SUR formulation of Vazquez and Beet [H], Beet et al. [ 3 . The 
latter would lead to a more challenging approach from a computational perspec¬ 
tive in our context. 

4.1 Principle 

Ideally, the posterior distribution of the parameters 6 = (to, C) after adding 
a new point ^(Ar+i) to the current DoE Dn should be as close as possible to 
the posterior distribution knowing the original function iJ, i.e. a relevant dis¬ 
crepancy measure between the two relative distributions must be minimized. 
Based on information-theoretical arguments given in Cover and Thomas m, 
the Kullback-Leibler (KL) divergence 

KL(^(0|y, d, H) |K(0|y, d, U {il(2)}))) , (11) 

is a good choice of discrepancy measure. Remind that given two densities p{x) 
and q{x) defined over the same space A, 

KL(p||(3') = f p{x)log'^^dx. 

Jx 




Ideally, the next point Z(jv+i) should be searched within the feasible region 
as the global minimum of this divergence. But obviously, the unknown term 
7r(0|y, d, H) makes this formulation intractable. But a tractable sub-optimal 
criterion can be heuristically derived from it by the following rationale. It must 
be noticed that 


^{N+l) 


argmin Kl( 7r(6»|y,d,i?) 117r(6»|y,d,U {i/(z)})), 

zGO 


argmin KL(|7r(6»|y, d, i?) 117r(6»|y, d, Hr,„ U {i/(z)})j 
-KL(7r(0|y,d,iJ)||7r(0|y,d,H,,„)), 

argmax / tt 0 y, d, H) log-——— -^- dO. 

zen Jeen 7r(0|y, d, Hr,„) 


The intractable target density 7r(0|y, d, H) has to be replaced with its best avail¬ 
able approximation, which is 7r(0|y, d, U {H{z)}). Under the kriging as¬ 
sumptions, for any z this distribution is closer of 7r(0|y, d, H) than 7r(0|y, d, 
Therefore, a sub-optimal version of the idealistic criterion is: 

Z(7V+i) = argmax KL(7r(0|y,d,H£,„ U {i/(z)}) II 7r(0|y,d,Hi,„)). 

In other words, the chosen strategy aims at finding the optimal point Z(jv+i) 
which modifies the actual distribution 7r(0|y, d, as much as possible in an 

information-theoretic sense. First proposed by Stein as a loss function, the 
dissymetric KL divergence between the two consecutive posterior distributions, 
which is invariant under one-to-one transformation of the random vector 0, has 
an operative interpretation as the loss of information (in natural information 
units or nits) which may be expected by choosing the baddest approximation 
40 |y,d,H^„) instead of the best (available) 7r(0|y, d, U {H{z)}) [T^ IB]. 


The preceding formulation is not satisfactory yet, since one evaluation of 
the criterion requires one evaluation of H, which is time-consuming. However, 
in the spirit of EGO, it is possible to derive a new criterion considering the 
following Gaussian process based on the available observations instead of 

H: 


hN{z) := H(z)|H^„, (12) 

which follows the normal distribution given in 0. Thus, we define the expected 
divergence criterion'. 

Z(N+i) = argmaxE,,(;,„) [KL(7r(0|y,d,Hr,„ U{/iAr(z)}) (13) 

||7r(0|y,d,HB,,))]. 

The idea of considering the Gaussian variable hN{z) rather than the predictor 
Hn(z) allows to account for the uncertainty introduced by the kriging method¬ 
ology, while it requires usual Monte Garlo methods to approximate the double 
integrals, i.e. the expectation and the KL divergence. 

Even if no run of H is required, the evaluation of this expected divergence 
criterion requires many calculations. In the next section, a heuristic is proposed 
to shrink the computational cost of the approach. 
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4.2 The Expected Conditional Divergence heuristic 


Preliminary experiments showed that the criterion defined in (13) is generally 
too expensive to be useful, except for extremely CPU-consuming code H. The 
main reason is that any test of a new point z requires to run a Gibbs sam¬ 
pler. Therefore a last adaptation of the criterion is proposed: the Expected 
Conditional Divergence (ECD) criterion depends only on the intermediate full- 
conditional posterior distributions of 9. More precisely, at the (r-l-l)-th iteration 
of the Metropolis-Hastings-within-Gibbs algorithm, the strategy is defined as: 


with 


Z(Ar+i) = argmaxECD(z) 


ECD(z) 






(14) 


(15) 


where and X’^'’+^)(z) denote the missing data samples simulated from 


X(^+i)(z) ^ ^(•|y,d,0(’'+i),Hz,,,U{/i^(z)}). 

It is worth noting that in the ECD criterion, the final posterior distribution 
of 9 is replaced by its sequential conditional posterior distribution at the (r -|- 
l)-th iteration. At the (r -|- l)-th iteration of the Gibbs sampling, given a 
candidate z to enrich the DoE, this heuristic enables to compute a value ECD(z) 
which is likely a sufficient approximation of the expected divergence criterion 
for the global algorithm to perform well. Moreover, once ECD(z) has been 
evaluated, the computation of ECD(z') at a new candidate z' takes benefit 
of the computations performed during the calculation of ECD(z) (sampling of 
XC+i) by Metropolis-Hastings, then sampling of 9 given X(’'+^)) and does not 
require a full Gibbs sampling anymore (just the MCMC sampling of X*^’'+^)(z), 
then the sampling of 9 given X*^’"+^^(z)). Hence it allows an exploration of the 
input space (optimization of ECD) for a acceptable CPU-cost. 

Finally, using a standard Monte-Carlo estimator to estimate the expectation 
of the KL divergence according to 7r(/ijv) (see (HH), the ECD heuristic algorithm 
proceeds as follows: 


ECD strategy - 

Given an initial design Dm with the corresponding evaluations 

Hdjv °f H-. 

1. r := 0. 

2. Perform k new Gibbs iterations (Section [3.2^ ; r := r + k: this gives 

g(r+l) ^ 

3. Sample from tt (jy, d, Hdjv) (see Appendix A). 

4. Sample T = {0i, . . . , } from 7r(-|X^^'’'^\y, d) (explicit distribution: see 

steps 1 and 2 of Section 

5. Get a new point Z(n+i) fo enrich the DoE by the optimization of ECD (simulated 
annealing, see Appendix C): for any z, assess ECD(z) if needed by: 
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(a) Generate M samples (h^ (z), . . . , (z)) according to (121 and build 


M corresponding emulators 




with H'j^^i(z) based 


on the dataset H£)jYU{/i]y(t)} (no re-estimation of the covariance 
function parameters tp, see Section 


(b) for 1 < i < M , 

(i) Sample from 7r(-|y, d, (t)) (see Appendix A). 

(ii) Sample = { 6 \,.. . , 9 ]^^} with 6 = {mi, ... ,mq,Cii,. . . ,Cgq) from 
7r(-| X(’'^^^’'( 2 ), y, d) (explicit distribution: see steps 1 and 2 
of Section [Ol ). 

(c) ECD{z) := yf X/f-i 11 where KL(.||.) denotes the KL divergence 

estimate (see Appendix B). 


6. Dn ■= Dn 'J {zn+i} and Hd„ := Hdjv U {H(tjv+i)} (new run of H) . 

7. Return to 2 if t^Hdjv less than the maximal number of runs of H. 


In our numerical experiments, the optimization (step 5) and the KL diver¬ 
gence estimation (step 5.(c)) are respectively performed using the simulated 
annealing (SA) method [IS] and the Nearest-Neighbor (NN) method of Wang 
et al. (see Appendices C and B for detail): other choices are possible. 

Let us remark that it can be reasonable to decrease the CPU-cost of ECD by 
neglecting the dependencies between the components of 0: eventually, assuming 
that these components are independent substantially decreases the cost of the 
k-NN KL divergence estimation, since the multivariate KL divergence is then 
the sum of univariate KL divergences. It would be also feasible to suppose that 
9 is made up with independent random vectors (e.g. assuming independence 
between m and C). In fact, this technique could be directly applied to the ex¬ 
pected divergence criterion (previous section), thus offers an alternative to ECD. 
However, it is not investigated hereafter, because ECD alone leads to a satis¬ 
factory trade-off between efficiency of the DoE enrichment and computational 
cost, in our industrial context. 


5 The Weighted-IMSE criterion for adaptive de¬ 
signs 

This section is devoted to propose an alternative criterion of adaptive design, by 
adapting the popular weighted-IMSE criterion 1311 HE], reminded hereinafter, to 
the Bayesian context of probabilistic inversion. 

5.1 The Integrated MSE criterion 

The Integrated Mean Square Error (IMSE) criterion |3I| is a measure of the 
average accuracy of the kriging metamodel over the domain U: 

IMSE(U) = [ MSE{z)dz, 

Jq 
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where MSE(z) is defined in the Gibbs sampler in § 3.2 Given a current design 
Dpf of N points, Picheny et al. proposed the following WIMSE criterion as 
an alternative approach to improve the prediction accuracy in regions of main 
interest: 


WIMSE(z*) = [ MSE{z\DnU{z*})w{z\Dn,IId„) dz, (16) 

Jn 

where MSE {z\Dn U {z*}) denotes the prediction variance by adding the point 
z* = {x*,d*) into Dn and w (zjZlAr, Hd„) is a weight function emphasizing 
the MSE term over these regions of interest. The calculation of MSE does not 
depend on the expensive evaluation H(z*) and the weight factor w only depends 
on the available observations . The next point to add to the DoE is thus 
defined by 

Z(Ar+i) = argmin WIMSE(z). 


5.2 Adaptation to the Bayesian inversion context 

Defining the regions of interest is the essential task in applying the WIMSE 
criterion. As presented in previous sections, a probabilistic solution to inverse 
problems is to approximate the posterior distribution of the parameters 9 = 
{m,C) using a Metropolis-Hastings-within-Gibbs algorithm (cf. Section 3.21. 
Assuming that the {N + 1)—th new point is added at the (r + 1)—th iteration 
of the Gibbs sampling, the weight function is defined by the following formula: 


w {z\Dn,IId,^) 


oc 


oc 


J] 7r(x,d|y„ 0(^+1), He,), 

2=1 


n 

|R + MSE(a;, d )|“2 .exp 

i=l 



(17) 


where 


-1 


Ai = (a; — (x — 

- {yi- H{x,d)^ + MSE(x, d)) {yi- H{x,d)^, 


which is derived from the full conditional posterior distribution of X described 
in Section |3.2[ It can be considered as a measure of the posterior prediction 
error. The advantage of this choice is twofold. First, this weight function lo 
indicates a potential position for the missing-data X where the accuracy of the 
metamodel should be improved. Second, this weight function depends on the 
observation sample y = {yi,..., 2/™}) coherently with the Bayesian conditioning 
process and providing a purpose-oriented sense to the design. 


Besides, since the two terms MSE(- • •) and w( - ■ ■) of (161 are different in 
nature, a tuning parameter a is introduced (as an exponent) to allow for a trade¬ 
off between the two. Therefore the following version of the WIMSE criterion is 
proposed: 


WIMSE(z*) = MSE°‘ {z\Dn U {z*}) w^-°‘{z\Dn,'H.d„) dz. (18) 
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In this equation, a varying between 0 and 1 makes the criterion more flexible: 
if a is close to 1, the impact of the weight parameter w disappears and the 
criterion becomes IMSE; if a approaches to 0, the prediction error MSE will 
not be accounted for. Experimental trails proved that the choice of a is critical. 
Furthermore, such a chosen weight function w, defined as the product of n 
possible small densities, may cause numerical (underflow) problems. Replacing 
w^~°‘ by the probability density function w^~°‘ j f as suggested in Picheny 

et al. |2H], can solve such difficulties. In practice, a Monte Carlo method must 
be used to estimate the normalizing constant. 

For a DoE of dimension one or two, a Cartesian grid over the design space fl 
can be used to solve the numerical integration and optimization problems [28j . 
In more general cases of higher dimension, stochastic integration and global op¬ 
timization techniques should be preferred, e.g. Monte Carlo methods and SA 
algorithms (Appendix C). 


6 Numerical experiments 

In this section, numerical studies are conducted on a manageable example to 
assess the performances of both adaptive kriging strategies. The performances 
of the WIMSE and ECD criteria are compared with the standard maximm-LHD 
and the simple MMSE (maximum MSE) criterion, defined by 

Z(N+i) = argmin maxMSE (zlD^v U {z*}), 

zGO 

under the same evaluation budget. A good kriging metamodel has been built 
using a large DoE for playing a benchmark role. 


Consider the parametric function previously used in Bastos and O’Hagan 

0: 


H(xi,X2) 


/ 1 \\ /2300xf-k 1900a;?-k 2092X1-k 60 V 

lOOx?-k 500x? -k 4xi-k 20 V 


with Xi € [0,l],i = 1,2. In the experimental trials, the design domain fl = 
[0,1]^. The dataset Y = (Yi,i = 1,...,30) of size n = 30 is simulated from 
the uncertainty model ( [T^ where the missing data is generated with the 
following Gaussian distribution, truncated in domain fl: 





( 20 ) 


and the error term Ui is the realization of a A/i(0,10 random variable. More¬ 
over, in and the hyperparameters are chosen as follows: a = 1, v = b, 
fj. = (0,0) and 


A = 2 


0.182 

0 


0 

O.V 


In practice, the burn-in period of the MCMC algorithm can be verified by 
the Brooks-Gelman diagnostic Rbg of convergence [3]. It was calculated every 
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50 iterations and the convergence was not accepted until Rbg < 1-05 for at 
least 3,000 successive iterations. 

The main features of the generated DoEs are summarized on Table All 
initial DoEs consist of the same five points produced by maxjmm-LHD, and 
then are completed by five other points selected by the criteria. Table [^displays 
the value of parameters involved in carrying out the two criteria and the SA 
algorithm. 

Figure provides a comparison of all designs with the standard 10-points- 
maximin-LSD (encompassing the initial DoE). For the W-IMSE criterion, the 
added points are found not far from the hypothesized mean (0.5,0.7) and the 
four WIMSE designs are quite similar. However, the posterior distributions of 
9 are quite sensitive to the choice of a. Figure displays these posterior dis¬ 
tributions for the corresponding metamodels. The WIMSE criterion improved 
the posterior distributions of and C 22 , but the choices a = 1,0.5 and 0.2 
do not work well for the posterior distribution of toi and Cn. It can be seen 
that the lO-points-maajfmm-LHD performs poorly, with respect to a 5-points- 
maximin-LiHD sequentially completed. Moreover, the MMSE criterion performs 
correctly. However, other experiments, conducted using the best value a = 0.8 
for the WIMSE criterion, are summarized on Figure]^ These results highlight, 
on this example, that the design build using the ECD criterion can significantly 
outperform the 10-point-maa:zmm-LHD, can perform more efficiently than the 
MMSE criterion and can do as well as the WIMSE criterion. 


7 Case-study: calibrating roughness coefficients 
of an hydraulic engineering model 

The case-study that motivated this work is the calibration, from observed wa¬ 
ter levels Y and upstream flow values d, of the roughness (so-called Strickler) 
coefficient X of the hydraulic computer model TELEMAC-2D. This software 
tool is considered as one of the major standards in the field of free-surface flow 
by solving shallow water (Saint-Venant) equations [Uj. This parameter vec¬ 
tor summarizes the influence of the land nature on the water level, for a given 
discharge d. The model is used here to reproduce in two dimensions (geograph¬ 
ical coordinates) the downstream water level of the French river La Garonne 
between Tonneins and La Reole (Figure]^. 

The flow simulation of this 50km river section, including riverbed and flood- 
plain (cf. Figure]^, is conducted on very fine meshes defined by 41,000 knots, 
each parametrized by a roughness value. The dimension of X is diminished to 
(7 = 4 by taking account of: (a) the homogeneity of the land regularity in large 
areas surrounding the riverbed between four measuring stations (Table and 
Figure ; and : (b) the lack of observations of floodplain water levels at the 
uppermost subsection, which requires to fix the corresponding roughness coef¬ 
ficient. Details about the notation and meaning of each component of X are 
provided in Table 

The strong but physically limited uncertainty that penalizes the knowledge 
of Strickler coefficients is compatible, according to Wohl [IB], with simple and 
classic statistical distributions as the Gaussian law (numerically truncated in 
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0). Based on available bibliography summarized in Table and after discussing 
with ground experts, values for the hyperparameter /r for each dimension of X 
were simple to elicit (see Table . It was more tricky to find information about 
the correlations between the X. The strong differences of land nature between 
the riverbed and the foodplain made plausible the assumption of independence 
between the corresponding components of X. On the contrary, it is likely that 
two connected riverbed section share roughness features. However, in absence 
of any additional information about these possibe correlations, Ce was chosen 
diagonal: 


/ 0 0 0 \ 

r = 0 0 0 

\ 0 0 0 '^minA^ / 

The calibration of each a was conducted by using marginal prior knowledge 
about the mean variation of the Manning coefficient M = 1/X, discussed in 
Liu m and displayed on Figure]^ A prior Manning estimator (M = 1/a m) 
can then be produced. A magnitude for the corresponding prior estimator of a 
(for the Strickler X = 1/M) can be derived assuming that the results on Table 
[^and Figure [^summarize a large number of past estimations. Further to this 
assumption, a crude in-law convergence 

a(C/(M-M) A Af(0,l). 

associated to a Delta method provides the approximate result 




A/'(0,1), 


and finally tr^ ~ The prior assessments of these variances are provided 

on Table assuming a virtual size a = 1 for each dimension (see § 3.1 for 
details). 

The relevance of a metamodelling approach was acknowledged since each run 
of TELEMAC-2D can take several hours. Maximin-LHD designs were produced 
over the domain fl, defined for the input vector z = {x,d) as 


H — f^maj ^ ^miiiTA ^ ^minAA ^ ^minAz. ^ 

in function of the bounds of variation domains summarized in Table Hmaj = 
[0,30] and flmiiiTA = ^minAA = f^minAi, = [20,70] (in (m^/^.s“^)). The domain 
fid was chosen as [ 90 . 05190 . 95 ] = [510,2373] where Qa is the a—order percentile 
of the known flow distribution, which is Gumbel with mode 1013 and 

scale parameter 458. 


Before running TELEMAC-2D, however, a Bayesian inferential study was 
briefly conducted using the MASCARET simplified computer code [T^, which 
describes a river by a curvilinear abscissa and uses the same input vector. While 
much more imprecise than TELEMAC-2D, the advantage of this simplified 
model is that the CPU time used for one run is shorter, so that the MCMC 
proposed in m can be conducted in due time using a static Maximin-UID 
design (and metamodelling calibrated once), using 20,000 iterations. The aim 


15 





of this study was to test the agreement between the prior assessments and the 
observations, following recommendations in min]. A set of n = 50 observations 
were available, among which the 10 most recent were preferentially selected, as 
the most representative of the actual conditions (riverbed homogeneity). For 
several sizes of design and the two datasets the marginal posterior distributions 
are displayed on Figure For each dimension, it appears that the regions of 
highest posterior density are in accordance with the prior guesses, which makes 
us confident in the relevance of the prior elicitation process. 

Based on this good relevance of the Bayesian model, a comparison of the 
three designs considered in this article was conducted by comparing the emulator 
errors yielded by the designs, using the coefficient of predictability Q 2 ■ A cross- 
validation leave-one-out version of this criterion is used here for computational 
simplicity m- 

^ . PRESS 

^2 1 _ _ _ AT I I _ I I 2 * 

- hdJ 

where Hd^ = jf Hiz(^^)) and PRESS = 4) = “ 

, with 

• e(i) is the prediction error at of a fitted model without the point Z(i); 

• is the approximation of H at derived from all the points of 
the design except Z(i). 

The closer Q 2 to 1, the smaller the variance explained by the emulator 
and the better the quality of the design (in terms of prediction power for the 
metamodel). Four designs are tested. Two Maximin-LHD designs D 20 and £>500 
of 20 and 500 points, respectively (the second one playing the role of a ’’reference 
design” leading to a very good approximation of the posterior distribution. Two 
other designs are sequentially elaborated using the ECD and WISE criterion, 
starting from an initial design Dio of 10 points: 10 other points are added. 

Displayed on Figure]^ the Q 2 coefficient related to the maximin-LHD D 20 
equals 0.9745 and the benchmark <52 corresponding to the D 500 equals 0.9933. 
Starting from a design of 10 points only, it appears natural that other designs 
are characterized by a lower Q 2 . However, by adding 10 points iteratively to 
the initial design Dio according to the two proposed criteria, an increasing 
value of Q 2 is obtained, which quickly beats the predictability generated by the 
maximin-LiHD D 20 . Finally using the ECD criterion provides a slightly better 
Q 2 value than using the WISE criterion. 

Coming back to the TELEMAC-2D computer code, the convergence of 
MCMC chains were obtained (using the n = 10 best observations) after 30,000 
iterations. For the various designs proposed in this article, the marginal pos¬ 
terior distributions of the four first parameters are displayed on Figure The 
Maximin-ljHD design D 20 (producing the approximate posterior in red) was 
made of 40 points, while other situations start from a DOE of 20 initial points, 
to which 20 other points are added sequentially (producing the approximate 
posteriors in blue and black). The reference Maximin-UID design D 500 (pro¬ 
ducing the best approximation of the target posterior, in green) is made of 500 
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points, as for the MASCARET application. A better proximity of the approx¬ 
imate posterior distribution produced using ECD to the target can be again 
noticed with respect to the approximation produced by the WIMSE approach. 


8 Conclusions and perspectives 

This article aims to provide an adaptive methodology to calibrate, in a Bayesian 
framework, the distribution of unknown inputs of a nonlinear, time-consuming 
numerical model from observed outputs. This methodology is based on improv¬ 
ing a space-filling design of experiments, typically the maxzmm-Latin Hypercube 
Design, that offers a non-intrusive exploration of the model. Kriging metamod¬ 
elling is used to avoid costly runs of the model. 

In this methodology, two adaptive criteria have been proposed to complete 
sequentially the current design. The first one is an adaptation of the standard 
Weighted-IMSE criterion to the Bayesian framework. It is obtained by weighting 
the MSE term over a region of interest indicated by the current full conditional 
posterior distribution. The other criterion, called Expected CD, is based on 
maximizing the Kullback-Leibler (KL) divergence between two consecutive ap¬ 
proximate posterior distributions related to the DoE. A clearer interpretation 
can be given to the second criterion, as a crude approximation of the nega¬ 
tive KL divergence between the target posterior and the current approximate 
posterior distributions. 

Numerical experiments have highlighted, on two examples, that applying this 
adaptive procedure can reduce the prediction error and improve the accuracy of 
the metamodeling approximation, compared with a standard space-filling DoE. 
Therefore such adaptive procedures appear to be useful when the CPU time 
required to compute an occurrence of the simulator H of physical models is 
dramatically greater than the time required to run a Gibbs sampler, a Monte 
Carlo integration or to perform an optimization with a Simulated Annealing 
procedure. 

Both criteria involve expensive numerical integration. For a similar gain in 
information, the ECD criterion appears to be a little more expensive than the 
WIMSE criterion since it requires the calculation of the empirical KL diver¬ 
gence. However, in the definition of WIMSE, the choice of a is quite important. 
As the second weight function is globally much smaller that the first predic¬ 
tion error, this balance parameter permits us to find a good behavior of this 
criterion. In this article, this important parameter was not systematically stud¬ 
ied, but the computation of the best (or at least a ’’good”) value of a makes 
the use of WIMSE much less easy. In addition of this better interpretation (in 
information-theoretic terms), this feature lets us have a clear preference for the 
use of the ECD criterion. 

This work is a first approach to designing sequential strategies for both ex¬ 
ploring a black-box, time-consuming computer code and in parallel calibrating 
some of its unobserved random inputs. The democratized use of metamodelling 
requires, in practice, to make various approximations. For instance, it is cur¬ 
rent that the hyper-parameters of kriging metamodels are updated (e.g., by 
maximum likelihood estimation) after several additions of points to an origi¬ 
nal design, since each updating (which should be formally conducted after each 
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addition of a new point) can be a costly operation itself without fundamental 
improvement [35113S] • Following a same idea of reaching a trade-off between a 
theoretical aim and practical easiness, idealistic criteria are often necessarily ap¬ 
proximated, or favored partially because their computation can made explicit. 
This is for instance the case of the Expected Improvement (El) criterion pro¬ 
posed by Jones et al. m which makes profit from the Gaussian properties of 
kriging metamodels. 

Such approximations appeared needed to conduct this first study and high¬ 
light the interest of the approach. The rationale developed in Section j^must now 
be followed by a truly theoretical work that could robustify the proposed choices, 
accompanied with more systematical simulation studies with other static or dy¬ 
namic designs of numerical experiments. Especially, the statistical control of 
the metamodelling-based posterior approximation with respect to the target 
posterior should be a focus point in future studies, by making profit of the rela¬ 
tionships between Kullback-Leibler divergences and discrepancy measures m 
as well as recent theoretical developments about relaxing assumptions under 
which metamodelling provides a fair approximation of the real numerical model 
(e.g., Vazquez and Beet |42j 1. Such works are currently being conducted. For 
the present time, it must be noticed that the approximate posterior distribution 
produced by the ECD approach can be considered as a fast non-intrusive way of 
modelling an instrumental distribution, to be used in a final step of importance 
sampling (typically to compute a posterior mean), provided a small computa¬ 
tional budget be kept or made available for running the numerical model. 


Acknowledgments 

The authors gratefully thank Gilles Geleux (INRIA) for many fruitful discussions 
and advices. This work was partially supported by the French Ministry of 
Economy in the context of the GSDL {Complex Systems Design Lab) project of 
the Business Gluster System@tic Paris-Region. 


References 

[1] Bachoc, F. (2013). Gross validation and maximum likelihood estimation of 
hyper-parameters of gaussian processes with model misspecification. Compu¬ 
tational Statistics and Data Analysis, 66:55-69. 

[2] Barbillon, P., Geleux, G., Grimaud, A., Lefebvre, Y., and i£ ■ tienne de Roc- 
quigny (2011). Nonlinear methods for inverse statistical problems. Compu¬ 
tational Statistics & Data Analysis, 55(1): 132 - 142. 

[3] Bastos, L. S. and O’Hagan, A. (2009). Diagnostics for Gaussian process 
emulators. Technometrics, 51(4):425-438. 

[4] Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Gafeo, J. A., Gavendish, 
J., Lin, G.-H., and Tu, J. (2007). A framework for validation of computer 
models. Technometrics, 49(2):138-154. 

[5] Beet, J., Ginsbourger, D., Li, L., Picheny, V., and Vazquez, E. (2012). 


18 


Sequential design of computer experiments for the estimation of a probability 
of failure. Statistics and Computing, 22(3):773-793. 

[6] Berger, J., Bernardo, J., and Sun, D. (2015). Overall objective priors. 
Bayesian Analysis, 10:189-221. 

[7] Bousquet, N. (2008). Diagnostics of prior-data agreement in applied bayesian 
analysis. Journal of Applied Statistics, 35:1011-1029. 

[8] Bousquet, N., Fouladirad, M., Grail, A., and Paroissin, C. (2015). Bayesian 
gamma processes for optimizing condition-based maintenance under uncer¬ 
tainty. Applied Stochastic Models in Business and Industry, 31(3):360-379. 

[9] Brooks, S. P. and Gelman, A. (1998). General methods for monitoring 
convergence of iterative simulations. Journal of Computational and Craphical 
Statistics, 7(4):pp. 434-455. 

[10] Geleux, G. and Diebolt, J. (1988). A random imputation principle: the 
stochastic em algorithm. Technical Report RR-0901, INRIA. 

[11] Geleux, G., Grimaud, A., Lefebvre, Y., and de Rocquigny, E. (2010). Iden¬ 
tifying intrinsic variability in multivariate systems through linearized inverse 
methods. Inverse Problems in Science and Engineering, 18(3):401-415. 

[12] Gover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. 
Wiley-Interscience, 2nd edition edition. 

[13] Fu, S., Geleux, G., Bousquet, N., and Gouplet, M. (2014). Bayesian in¬ 
ference for inverse problems occurring in uncertainty analysis. International 
Journal for Uncertainty Quantification. Forthcoming article. 

[14] Galland, J., Goutal, N., and Hervouet, J. (1991). Telemac: A new numeri¬ 
cal model for solving shallow water equations. Advances in Water Resources, 
14(3):138-148. 

[15] Goutal, N., Lacombe, J.-M., Zaoui, F., and El-Kadi-Abderrezzak, K. 
(2012). Mascaret: a 1-d open-source software for flow hydrodynamic and 
water quality in open channel networks. River Flow, pages 1169-1174. 

[16] Johnson, M. E., Moore, L. M., and Ylvisaker, D. (1990). Minimax and max- 
imin distance design. Journal of Statistical Planning and Inference, 26(2):131- 
148. 

[17] Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global op¬ 
timization of expensive black-box functions. Journal of Clobal Optimization, 
13:455-492. 

[18] Kennedy, M. G. and O’Hagan, A. (2001). Bayesian calibration of com¬ 
puter models. Journal of the Royal Statistical Society: Series B (Statistical 
Methodology), 63(3):425-464. 

[19] Kirkpatrick, S., Gelatt, G. D., and Vecchi, M. P. (1983). Optimization by 
simulated annealing. Science, 220(4598):671-680. 


19 



[20] Koehler, J. and Owen, A. (1996). Computer experiments. In Ghosh, S. and 
Rao, C., editors, Design and analysis of experiments, volume 13 of Handbook 
of statistics, pages 261-308. Elsevier. 

[21] Liu, C. and Rubin, D. B. (1994). The ecme algorithm: A simple extension 
of em and ecm with faster monotone convergence. Biometrika, 81(4):633-648. 

[22] Liu, D. (2009). Uncertainty Quantification with Shallow Water Equations. 
PhD thesis, Ph. Dissertation in Natural Risk Management, Carl-Friedrich- 
Gauss Faculty, University of Braunschweig. 

[23] Matheron, G. (1971). The theory of regionalized variables and its applica¬ 
tions. Les Cahiers du Centre de Morphologie Mathematique de Fontainebleau, 
Fascicule 5. Ecole des Mines de Paris. 

[24] McKay, M. D., Beckman, R. J., and Conover, W. J. (1979). A comparison 
of three methods for selecting values of input variables in the analysis of 
output from a computer code. Technometrics, 21:239-245. 

[25] Mockus, J., Tiesis, V., and Zilinskas, A. (1978). The application of bayesian 
methods for seeking the extremum. In Dixon, L. and Szego, G., editors. Global 
Optimization, volume 2, pages 117-129. North Holland, New York. 

[26] Morris, M. and Mitchell, T. (1995). Exploratory designs for computationnal 
experiments. Journal of Statistical Planning and Inference, 43:381-402. 

[27] of Engineers, U. A. C. (1996). Risk-analysis for flood damage reduction 
studies. Technical Report No. EM 1110-2-1619. 

[28] Picheny, V., Ginsbourger, D., Roustant, O., Hafka, R., and Kim, N.-H. 
(2010). Adaptive designs of experiments for accurate approximation of a 
target region. Journal of Mechanical Design, 132(7). 

[29] Pollard, D. (2013). Asymptotia: An Exposition of Statistical Asymptotic 
Theory. On line books. 

[30] Rasmussen, C. E. and Williams, C. K. 1. (2006). Gaussian Processes for 
Machine Learning. MIT Press. ISBN-13 978-0-262-18253-9. 

[31] Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989). Design and 
analysis of computer experiments. Statistical Science, 4(4):409-435. 

[32] Santner, T. J., Williams, B. J., and Notz, W. 1. (2003). The design and 
analysis of computer experiments. Springer Series in Statistics. Springer. 

[33] Sellin, R., Keast, J., and Beeston, D. V. (1997). Seasonal variation in river 
channel hydraulic roughness. In Proceedings of the 21 lAHR World Gongress 
’’Water for a Ghanging Global Gommunity” - Theme B: Environmental and 
Goastal Hydraulics: Protecting the Aquatic Habitat, pages 1390-1396. 

[34] Stein, C. (1964). Inadmissibility of the usual estimator for the variance 
of a normal distribution with unknown means. Annals of the Institute of 
Mathematical Statistics, 16:155-160. 


20 



[35] Stein, M. L. (1999). Interpolation of Spatial Data - Some Theory for Krig- 
ing. Springer Series in Statistics. Springer-Verlag New York. 

[36] Survey, U. G. (1989). Guide for selecting Manning’s roughness coefficients 
for natural channels and flood plains. U.S. Geologogical Survey Water Supply, 
paper 2339. 

[37] Tierney, L. (1996). Introduction to general state-space markov chain theory. 
In Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., editors, Markov 
Chain Monte Carlo in Practice, Interdisciplinary Statistics, chapter 4, pages 
59-74. Ghapman & Hall/GRG. 

[38] Toal, D. J., Bressloff, N. W., and Keane, A. J. (2008). Kriging hyperpa¬ 
rameter tuning strategies. The American Institute of Aeronautics and Astro¬ 
nautics (AIAA) Journal, 46:1240-1252. 

[39] Toal, D. J., Bressloff, N. W., Keane, A. J., and Holden, G. (2011). The de¬ 
velopment of a hybridized particle swarm for kriging hyperparameter tuning. 
Engineering Optimization, 43:1-28. 

[40] Vanderpoorten, A. and Palm, R. (2001). Gompared regression methods 
for inferring ammonium nitrogen concentrations in running freshwaters from 
aquatic bryophyte assemblages. Hydrobiologia, 452(1-3): 181-190. 

[41] Vazquez, E. and Beet, J. (2009). A sequential bayesian algorithm to esti¬ 
mate a probability of failure. In 15th IFAC Symposium on System Identifica¬ 
tion. 

[42] Vazquez, E. and Beet, J. (2011). Sequential search based on kriging: con¬ 
vergence analysis of some algorithms. ISI ? 58th World Statistics Congress 
of the International Statistical Institute (ISI911), Dublin, Ireland, August 
21-26:1. 

[43] Viollet, P.-L., Ghabard, J.-P., Esposito, P., and Laurence, D. (1998). 
Mecanique des Fluides Appliquee. Presses de I’Ecole Nationale des Ponts 
et Ghaussees. 

[44] Walesh, S. (1989). Urban Water Surface Management. John Wiley and 
Sons. 

[45] Wang, Q., Kulkarni, S. R., and Verdii, S. (2009). Divergence estimation 
for multidimensional densities via k-nearest-neighbor distances. IEEE Trans¬ 
actions on Information Theory, 55(5):2392-2405. 

[46] Wohl, E. (1998). Uncertainty in flood estimates associated with roughness 
coefficient. Journal of Hydraulic Engineering, 124(2):219-223. 


21 



Appendix A. Metropolis-Hastings step within the Gibbs 
sampler 

At step r + 1 of Gibbs sampling, after simulating the missing 

data can be updated with a Metropolis-Hasting (MH) algorithm. The 

MH step is updating = (A[,..., X;)' in the following way: 

• For i = 1,..., n 

1. Generate Xi ~ J(- | XI) where J is the proposal distribution. 

2. Let 


a{X ,Xi) = min — - --,1 , 

Wg(XW \y,6i^+^'),p,d,HD)J{X,\Xl) ^ 

where 


X = 

(x[+\ 

Y' 

• • • , ^i-l 5 ^2+15 

...,x;)' 

XM = 

(x[,.. 

• , ^2-1? ^2 5 ^2+1? ■ ■ 


'r-|-l _ 

/ 

with probability 

a(X[,X,), 

i 

1 

otherwise. 



Remarks 


Many choices are possible for the proposal distribution J. It appears that 
choosing an independent MH sampler with J chosen to be the normal 
distribution J\f{m give satisfying results for the model |ll. 


• In practice, it can be beneficial to choose the order of the updates by 
a random permutation of {1 ,..., n} to accelerate the convergence of the 
Markov chain to its limit distribution. 


Appendix B. Nearest-Neighbor approach 

The Kullback-Leibler (KL) divergence between samples 0* and T can be em¬ 
pirically calculated through the Nearest-Neighbor approach. 




d 1 ^L2{dj) 


log 




Li-1’ 


( 21 ) 


where d denotes the dimension of the parameter 9 {2q in our case), 
denotes the (Euclidean) distance between 0* G 0* and its nearest neighbor in 
sample 'k 


’^L.,{0)) = min \\9r-9)\\2, 
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and denotes the (Euclidean) distance of 9j to its nearest neighbor in 0® 

except itself (as it is also included in 0®) 




min \\di - 


It has been proved in |3^ that under some regularity conditions on the 
samples 0® and dr, the estimator II 'f') is consistent in the sense that 

hm Efia.L,.L,(0N|^)-KL(0®||^))" = 0, (22) 

Li,L2—^OC> \ / 

and asymptotically unbiased, i.e. 


lim 

L.R—^oo 


E 


KLi,,L,(0®||d') 


KL(0® lid-). 


(23) 


Appendix C. Simulated Annealing algorithm (searching for 
the minimum of a function /) 

Proposed by Kirkpatrick et al. m, the SA algorithm is a stochastic optimization 
algorithm. 

Given the current point , at iteration fc + 1 : 


1. Generate z ^ Af a'^j , with a certain fixed variance tr^. 

2. Let 

where (3k+i is the current temperature at step k + 1. 

3. Accept 


zlk+i] 



with probability 



otherwise. 


4. Update Pk+i = 0.99 x Pk . 
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DoE 1 


lO-point-mosimin-LHD 


DoE 2 S-points-mosimin-LHD + 5-points-WIMSE 

or 5-points-ECD 
or 5-points-MMSE 

DoE 3 lOO-points-masimin-LHD {benchmark) 


Table 1: Description of the three types of designs of experiments (DOE) (two- 
dimensional toy example). 


WIMSE 

a 

Number L of iterations 

Size M of the 



of the SA algorithm 

Monte Carlo algorithm 


1, 0.8, 0.5, 0.2 

1,000 

1,000 

ECD 

Number M of 

Sizes Li and L 2 of 

Number L of iterations 


generated GPs 

the samples 0* and 'I' 

of the SA algorithm 


100 

1,000 

1,000 

SA algorithm 

Initial point 

Initial temperature /3 

Standard deviation a 


X 

100 

100 


Table 2: Choice of parameters for the design criteria computation and the SA 
algorithm (two-dimensional toy example). 
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Nature of surface 

Value of Strickler coefficient {m}^^ ■ s 

Riverbed 

Smooth concrete 

75-90 

Earthen channel 

50-60 

Plain river, without shrub vegetation 

35-40 

Plain river, with shrub vegetation 

30 

Slow winding natural river 

30-50 

Very cluttered riverbed 

10-30 

Proliferating algae 

3.3-12.5 

Foodplain 

Meadows, uncultivated fields 

20 

Cultivated lands with low size vegetation 

15-20 - 18 

Cultivated lands with large size vegetation 

10-15 - 13 

Bush and undergrowth areas 

8-12 - 10 

Forest 

<10 

Low density urban sprawl 

8-10 

High density urban sprawl 

5-8 


Table 3: Realistic ranges of value for the Strickler coefficient in function of 
the nature of the surface, summarized from Survey [35], Walesh [33], Sellin 
et al. [33] and Viollet et al. [33]. Median values in bold type are interpreted by 
international experts as the most likely values taking account of uncertainties 
about the nature of vegetation, topographic irregularities, etc. 


(Sub)section 

Position 

X component 

Marginal hyperparameters 

Tonneins 





i 

La Reole 

Tonneins 

foodplain 

Y 

s,maj 

/^maj 

^maj — 4.1 

i 

Aval de Mas d’Augenais 

riverbed 

-^s,niinTA 



i 

Amont de Marmande 

riverbed 

Xs,minAA 

/^miriAA — 

O'minAA ~ '^■1 

i 

La Reole 

riverbed 

Xg^minAL 

l^minAL ~ 40 

min AL =7.1 


Table 4: Detailed meanings and prior modelling for each component of X (La 
Garonne roughness coefficients). The riverbed roughness coefficients are differ¬ 
entiated between the measuring stations listed in the first column. A virtual 
size 0 = 1 was chosen for each dimension. 
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Figure 1: Standard maxzmm-LHD, ECD design, WIMSE designs of experiments 
with a = 1,0.8, 0.5,0.2 and MMSE design (two-dimensional toy example). 
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Figure 2: Posterior distributions of 6 with benchmark, standard maximin-UlD, 
MMSE design and WIMSE designs with a = 1,0.8, 0.5,0.2 (two-dimensional 
toy example). 
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Figure 3: Posterior distributions of 0 with benchmark, standard maa:fmm-LHD, 
MMSE design and ECD design (two-dimensional toy example). 


La Reole 



Figure 4: Riverbed profile of French river La Garonne. 
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Vegetation 



Sand and pebbles at 
the bottom of the river 

Riverbed 


Floodplain 


Figure 5: Cross-section of a classical river. 



Figure 6: Uncertainty over the estimators of Manning coefficient (M = 1/X), 
from of Engineers ^7}. Cited (Fig. 3.5) in Liu [^ . 
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Figure 7: Approximations of the marginal posterior distributions of 9 for several 
sizes N maximin-ljHD and two encompassed observation datasets (n = 10 then 
n = 50) using the MASCARET computer code. 
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Figure 8: Comparison of the quality of different designs of numerical experi¬ 
ments using the MASCARET computer code. 
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Figure 9: Approximations of the marginal posterior distributions of 6 (first four 
dimensions) produced by several designs using the TELEMAC-2D computer 
code. The red stars indicates the prior means for each parameter. 
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